Acta Crystallographica Section D Structural Biology
● International Union of Crystallography (IUCr)
Preprints posted in the last 30 days, ranked by how well they match Acta Crystallographica Section D Structural Biology's content profile, based on 54 papers previously published here. The average preprint has a 0.02% match score for this journal, so anything above that is already an above-average fit.
Prester, A.; Spiliopoulou, M.; Schulz, E. C.
Accurate determination of state occupancies is essential for interpreting the structural heterogeneity inherent in time-resolved crystallography. However, in cases of high spatial overlap between states, as commonly observed in time-resolved crystallography data, the strong correlation between occupancy and atomic displacement parameters (ADPs) can render single point estimates from standard refinement protocols unreliable. We introduce MEROS (Multi-state Ensemble Refinement for Occupancy Statistics), a pipeline that implements an ensemble refinement approach to assess the post-refinement occupancy-ADP statistics of multiple overlapping states. MEROS utilizes a Monte Carlo sampling of the parameter space, performing independent refinements from randomized starting occupancies and ADP values to empirically characterize the convergence and uncertainty of the solution. The method is implemented as a modular Python pipeline that wraps established refinement programs, ensuring compatibility with existing workflows. We demonstrate its applicability in two case studies: a two-state ligand binding model in T4 lysozyme L99A and a four-state covalent catalysis mechanism in β-lactamase CTX-M-14. MEROS provides occupancy and ADP mean values with standard deviations that directly quantify the informational content of the experimental diffraction data.
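The ensemble idea in this abstract (many independent refinements from randomized starting occupancies and ADPs, summarized as means and standard deviations) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the MEROS code or API: the one-atom signal model `occ * exp(-B * s2 / 4)`, the sampling points `S2`, the random-search `toy_refine` standing in for a real refinement program, and all parameter ranges.

```python
import math
import random
import statistics

# Hypothetical sampling points in (1/d)^2, roughly 10 A to 2 A resolution.
S2 = [0.01 * k for k in range(1, 26)]


def model(occ, b):
    """Toy signal: occupancy scaled by a Debye-Waller-like falloff."""
    return [occ * math.exp(-b * s2 / 4.0) for s2 in S2]


def loss(occ, b, observed):
    """Sum of squared residuals between the toy model and the data."""
    return sum((m - o) ** 2 for m, o in zip(model(occ, b), observed))


def toy_refine(occ, b, observed, rng, steps=400):
    """Stand-in for a refinement run: accept random steps that lower the loss."""
    best = loss(occ, b, observed)
    for _ in range(steps):
        occ_try = min(1.0, max(0.0, occ + rng.gauss(0.0, 0.02)))
        b_try = max(1.0, b + rng.gauss(0.0, 1.0))
        trial = loss(occ_try, b_try, observed)
        if trial < best:
            occ, b, best = occ_try, b_try, trial
    return occ, b


def ensemble_statistics(observed, n_runs=30, seed=0):
    """Refine from randomized starts and summarize the converged values."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_runs):
        occ0 = rng.uniform(0.05, 0.95)  # randomized starting occupancy
        b0 = rng.uniform(5.0, 60.0)     # randomized starting ADP (B-factor)
        results.append(toy_refine(occ0, b0, observed, rng))
    occs = [r[0] for r in results]
    adps = [r[1] for r in results]
    return {
        "n_runs": n_runs,
        "occ_mean": statistics.mean(occs),
        "occ_sd": statistics.stdev(occs),
        "adp_mean": statistics.mean(adps),
        "adp_sd": statistics.stdev(adps),
    }


# Synthetic, noise-free data for a hypothetical state with occ=0.4, B=20.
observed = model(0.4, 20.0)
summary = ensemble_statistics(observed)
print(summary)
```

In this sketch the spread of the converged values across runs plays the role of the reported standard deviations: a narrow spread suggests the data constrain occupancy and ADP separately, while a wide spread suggests they trade off against each other.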
Fromm, S. A.; Mattei, S.
Structure elucidation of biological macromolecules by single particle cryogenic electron microscopy (SPA cryo-EM) or cryogenic electron tomography (cryo-ET) relies on low-dose imaging on cryogenic transmission electron microscopes (cryo-TEMs). Routine microscope setup remains technically demanding and can be time-consuming, particularly for inexperienced or infrequent users. We present LowDoseWizard, a guided workflow implemented in SerialEM that enables rapid and standardised setup of cryo-TEM imaging conditions. From minimal user input, the workflow configures microscope optics, camera parameters and image shift settings for all low-dose imaging states, and guides the user through key daily alignment procedures including beam shift offset calibration, objective lens astigmatism correction and coma-free alignment. The workflow is organised into modular routines that can be executed sequentially or independently, while microscope-specific acquisition parameters are defined in editable configuration files, allowing flexible adaptation to different instruments without modification of the core scripts. Across user sessions on three microscopes at EMBL Heidelberg, the complete setup required on average less than 15 minutes. To assess whether predefined imaging conditions generated by the workflow are compatible with high-resolution data collection, we acquired apoferritin data on a 200 kV Glacios and a 300 kV Titan Krios. These datasets yielded reconstructions at 1.62 Å and 1.09 Å resolution, respectively, demonstrating that rapid, guided setup can support near-atomic and atomic-resolution single particle cryo-EM. LowDoseWizard lowers the barrier to robust cryo-TEM setup, reduces the time spent on routine parameter selection and alignment, and helps users focus on sample-specific aspects of data acquisition such as target selection.
The workflow should be particularly valuable in shared instrumentation environments, where accessibility, reproducibility and efficient microscope use are critical.
Weinert, T.; Standfuss, J.; Seidel, H. P.
Macromolecular crystallographic refinement underpins structural biology, yet existing software packages often lack accessible, modular codebases amenable to rapid method development. Here, we introduce TorchRef, a PyTorch-based crystallographic refinement framework that exposes all refinable parameters (atomic coordinates, displacement parameters, occupancies, and scale factors) to automatic differentiation. The framework implements FFT-based structure-factor calculations, the French-Wilson treatment of intensities, bulk-solvent modeling with established mask parameters, and stereochemical restraints from the CCP4 Monomer Library. A modular target architecture allows loss functions to be combined, weighted, and extended independently of the core refinement machinery. Validation against 1,000 PDB structures demonstrates that TorchRef-based refinement reproduces a median R-free within 1% of Phenix while maintaining comparable model quality. Structure factor calculation in TorchRef scales readily across multiple CPU cores and is over 100 times faster on modern GPUs than CCTBX. To showcase how modern methods like time-resolved crystallography can benefit from the flexibility that TorchRef provides, we implemented direct refinement of a typical time-resolved model against amplitude differences, a use case currently not explored by classic refinement programs. TorchRef is released under the MIT license with full API documentation and tutorials, providing an accessible platform for developing and testing new crystallographic refinement protocols. Synopsis: TorchRef is an open-source PyTorch-based crystallographic refinement framework that exposes all refinable parameters to automatic differentiation, delivers GPU-accelerated structure-factor evaluation more than 100x faster than CCTBX, and enables new workflows, such as direct refinement against amplitude differences in time-resolved crystallography.
Scott, L. W.; Perez-Segura, C.; Hadden-Perilla, J.; Zlotnick, A.
In an infection, Hepatitis B Virus (HBV) core protein (HBc) normally assembles into icosahedral capsids. Capsid Assembly Modulators (CAMs) are direct-acting antivirals that induce HBc mis-assembly and are the subject of active research and development. Two versions of HBc are used in structural studies of CAM-HBc complexes: Cp150 and Cp149-Y132A. Cp150 forms empty icosahedral capsids that are structurally indistinguishable from those found in virions. The Y132A mutation of Cp149 leads to an assembly-defective soluble protein that crystallizes as flat hexagonal sheets, where the hexagons resemble icosahedral quasi-sixfold vertices. In this study, we compare structures of CAM-bound Cp150 to CAM-bound Cp149-Y132A. In capsids, the residues forming the CAM site shift to match the structure of bound CAMs, an induced fit. In Cp149-Y132A crystals, CAM sites show little structural adjustment in response to different CAMs binding. In turn, the array of residues that interact with CAMs varies from CAM to CAM in capsid structures but remains nearly constant in Cp149-Y132A crystals. These results illustrate important differences between CAM binding in Cp149-Y132A and Cp150 structures that will contribute to future CAM design.
Hynönen, M. J.; Venkatesan, R.
Mycobacterium tuberculosis (Mtb), the causative agent of tuberculosis, can use host-derived lipids as a carbon and energy source for survival. Mammalian cell entry (Mce) associated membrane (Mam) proteins are important for the stability of lipid-importing Mce complexes. Mtb has five homologs of Mam proteins, referred to as orphaned Mam (OmamA-E) proteins. A recent study suggested that OmamC (Rv1363c) is essential for the storage and utilization of lipids under starvation in Mtb. To understand the structure and interactions of OmamC, we generated a truncated soluble variant of OmamC (OmamC129-261). Here, we report on the challenges encountered during the crystallization and structure determination of OmamC129-261 and the strategies applied to overcome them. Despite the AlphaFold2 predicted model providing an initial molecular replacement solution, experimental phasing was necessary to determine the structure of OmamC129-261. Heat treatment of the protein prior to crystallization setup removed partially unfolded protein and played a critical role in enhancing the reproducibility and diffraction quality of OmamC129-261 crystals. Although reported earlier, heat treatment is not a widely used method. It is worth trying, especially when faced with poor reproducibility and diffraction of crystals.
Dong, Y.; Yang, Z.; Schneider, M.; Scherzer, O.; Schuetz, G.
We introduce a workflow to identify oligomeric structures that are recorded with single-molecule localization microscopy (SMLM) under cryogenic conditions. Typically, these oligomers are assumed to consist of protomers arranged as equilateral two-dimensional polygons and every protomer is labeled with a dye molecule for visualization. Unlike previous work, we consider scenarios in which the sample plane has an unknown orientation relative to the focal plane. Our contribution is a high-precision plane-fitting algorithm to determine the sample plane, combined with geometrical transformations and two circle-fitting algorithms to identify the oligomeric structures. Our simulations on synthetic data demonstrate that the proposed workflow achieves high accuracy in estimating both the unknown tilted plane and the oligomer size.
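The geometric steps named in this abstract (fit the tilted sample plane, transform the localizations into in-plane coordinates, then fit a circle to estimate oligomer size) can be sketched on noise-free synthetic data. This is a minimal illustration under stated assumptions, not the authors' algorithms: the plane is fitted by ordinary least squares as z = a·x + b·y + c (which assumes the plane is not vertical), the circle fit is the algebraic (Kåsa) method, and every function name here is hypothetical.

```python
import math


def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def lstsq3(rows, targets):
    """Least squares for three unknowns via normal equations and Cramer's rule."""
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atb = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(3)]
    d = _det3(ata)
    solution = []
    for col in range(3):
        mc = [row[:] for row in ata]
        for i in range(3):
            mc[i][col] = atb[i]
        solution.append(_det3(mc) / d)
    return solution


def plane_basis(a, b):
    """Orthonormal in-plane axes (u, v) for the plane z = a*x + b*y + c."""
    lu = math.sqrt(1.0 + a * a)
    u = (1.0 / lu, 0.0, a / lu)
    ln = math.sqrt(a * a + b * b + 1.0)
    n = (-a / ln, -b / ln, 1.0 / ln)  # unit normal of the plane
    v = (n[1] * u[2] - n[2] * u[1],   # v = n x u
         n[2] * u[0] - n[0] * u[2],
         n[0] * u[1] - n[1] * u[0])
    return u, v


def fit_plane(points):
    """Fit z = a*x + b*y + c to 3D points; returns (a, b, c)."""
    rows = [[x, y, 1.0] for x, y, _ in points]
    return tuple(lstsq3(rows, [z for _, _, z in points]))


def to_plane_coords(points, a, b, c):
    """Express 3D points in the 2D coordinate system of the fitted plane."""
    u, v = plane_basis(a, b)
    coords = []
    for x, y, z in points:
        d = (x, y, z - c)
        coords.append((sum(p * q for p, q in zip(d, u)),
                       sum(p * q for p, q in zip(d, v))))
    return coords


def fit_circle(points2d):
    """Algebraic (Kasa) circle fit; returns (center_x, center_y, radius)."""
    rows = [[2.0 * x, 2.0 * y, 1.0] for x, y in points2d]
    cx, cy, k = lstsq3(rows, [x * x + y * y for x, y in points2d])
    return cx, cy, math.sqrt(k + cx * cx + cy * cy)


def tilted_polygon(n, radius, a, b, c, center=(3.0, -2.0)):
    """Synthetic regular n-gon lying in the plane z = a*x + b*y + c."""
    u, v = plane_basis(a, b)
    points = []
    for k in range(n):
        t = 2.0 * math.pi * k / n
        pu = center[0] + radius * math.cos(t)
        pv = center[1] + radius * math.sin(t)
        points.append((pu * u[0] + pv * v[0],
                       pu * u[1] + pv * v[1],
                       c + pu * u[2] + pv * v[2]))
    return points


# Hypothetical hexamer of radius 5 in a tilted plane.
points = tilted_polygon(6, 5.0, a=0.3, b=-0.2, c=1.0)
a, b, c = fit_plane(points)
cx, cy, radius = fit_circle(to_plane_coords(points, a, b, c))
print(a, b, c, radius)
```

With real localizations, noise and incomplete labeling would call for more robust estimators than these two least-squares fits; the sketch only shows how the plane fit, the coordinate transformation, and the circle fit chain together.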
Gonda, I.; Junker, D.; Eggimann, F.; Kaech, A.; Szwedziak, P.
Due to recent technological advances, in situ structural cell biology is becoming a high-throughput microscopy technique as all the steps of the workflow, from sample preparation to data analysis, are executed faster, more reliably and more reproducibly. Sample thinning by cryoFIB-SEM is an essential tool in preparing electron-transparent lamellae of biological specimens suitable for further characterization by cryoET. Modern cryoFIB-SEM instruments can be operated remotely and are capable of automated and unsupervised lamellae preparation. To take full advantage of these developments they need a constant supply of LN2 to maintain cryogenic conditions inside the microscope chamber. Here, we introduce a custom automated LN2 refill system that is compatible with gas-cooled cryostages, supports long-term cryoFIB-SEM operations and liberates the user from highly repetitive and manual work. We believe this solution can be utilized with other cryoSEM or cryoFIB-SEM devices requiring N2 gas-flow cooling.
Kim, A.-R.; Perrimon, N.
As protein structure prediction tools become widely adopted across biology, there is a growing need for accessible methods to assess and visualize predicted protein-protein interactions (PPIs). Here we present LIVIA (Local Interaction Visualization and Analysis), a browser-based tool that computes local PPI confidence metrics across multiple prediction platforms, identifies predicted interface residues, embeds an interactive Mol* 3D viewer, and generates visualization scripts for ChimeraX and PyMOL. The tool automatically detects prediction formats; all parsing and computation occur locally on the user's machine. LIVIA is freely available at https://flyark.github.io/LIVIA.
Joachimiak, A.; Tan, K.; O'Connor, K. A.; Zhou, X.; Gade, P.; Garcia, E.; Tan, A.; Nijhawan, A.; Endres, M.; Kim, Y.; Greenwood-Quaintance, K.; Patel, R.
Serine-aspartate repeat-containing protein D (SdrD) is a Staphylococcus aureus cell wall-anchored, calcium-binding adhesin member of the MSCRAMM Sdr subfamily that may contribute to bacterial adhesion and virulence. S. aureus is the most common cause of periprosthetic joint infection (PJI). Population-level distribution and sequence diversity of SdrD among clinical PJI isolates have not been systematically characterized, and the SdrD binding mechanism is still not well understood. To address these gaps, sdrD alleles were queried across 156 newly sequenced PJI isolates and compared to publicly available S. aureus genomes, and nucleotide- and protein-level phylogenies of the sdrCDE locus constructed. The SdrD crystal structure from S. aureus JH1 was determined, with solution small-angle X-ray scattering (SAXS) and molecular dynamics (MD) simulations, and assessment of conformational changes with calcium depletion. Three dominant sdrD subtypes were defined, associating with USA300, JH1, and TCH60; the JH1 sdrD subtype was predominant among PJI isolates. Structural studies showed that the conformation of individual domains and interdomain organization of the multidomain SdrD have limited flexibility in solution, and that the calcium-binding B domain retains its core fold under conditions of calcium depletion. Together, the findings presented support functional diversification among Sdr family members in mediating host attachment and inform a re-evaluation of the ligand-binding mechanism previously proposed for SdrD. AUTHOR SUMMARY: Staphylococcus aureus is the leading cause of infections that develop around joint implants (periprosthetic joint infection, PJI). This bacterium has a large arsenal of surface proteins that allow it to stick to human tissues and implanted devices. This work focused on one such protein, SdrD, which has been linked to implant-associated infections but the structure and diversity of which among patients with PJI had not been well characterized.
The genetic sequences of SdrD were analyzed across thousands of bacterial genomes, including those from patients with PJI. Distinct genetic variants of the protein were found, one of which was particularly common with PJI. The three-dimensional structure of SdrD was determined at atomic resolution and solution small-angle X-ray scattering (SAXS) and molecular dynamics used to study how it moves and responds to changes in its environment. Contrary to what was previously described, SdrD was shown to be relatively rigid. These findings change how SdrD's mechanism of action should be considered, potentially informing design strategies to block bacterial attachment before infection takes hold.
Shahid, S.; Lundin, D.; Rozman Grinberg, I.; Sjöberg, B.-M.
The prevalent transcriptional repressor NrdR binds to highly conserved prokaryotic sequences in the promoter regions of operons encoding the essential enzyme ribonucleotide reductase. The NrdR binding sites consist of two partially palindromic 16 bp sequences (NrdR boxes) separated by a 15-16 bp linker sequence. We have assessed the requirement of both boxes for binding and the propensity of different NrdRs to bind to heterologous binding sites, and have found that the linker sequence is constrained only in length and not in sequence conservation. As we have observed several deviations from the conserved sequences of the NrdR boxes, we here test the conservation requirements of individual basepairs in the NrdR boxes using a synthetic DNA fragment (Synt DNA) to which the NrdR proteins from the actinomycete Streptomyces coelicolor and the gammaproteobacterium Escherichia coli bind equally well as to their homologous binding sites. By introducing isolated mutations to Synt DNA and testing the binding capacity of NrdR from S. coelicolor and E. coli, we expand our understanding of what criteria are needed to build a functional binding site for the NrdR repressor.
Zafiropoulo, H. R.; Thomas, J. E.; Cortez, N. R.; Apostol, K.; de Sa, A.; Khosravi, R.; Moore, L.; Berndsen, C. E.; Bibel, B.
Species of Bacillus bacteria including Bacillus safensis and Bacillus subtilis are finding increasing uses in biotechnology and bioremediation, thanks in part to their metabolic robustness. Malate dehydrogenase (MDH) is at the heart of central metabolism and thus a better understanding of Bacillus MDH proteins could aid in the optimization of these applications. MDHs of Bacillus spp. belong to the lactate dehydrogenase (LDH)-like class of MDHs, otherwise known as the MDH3 class. Despite wide prevalence in nature among prokaryotes and archaea, this typically homotetrameric class is understudied compared to the MDH1 and MDH2 classes found in eukaryotes. We therefore recombinantly expressed and purified MDH proteins from two societally relevant Bacillus spp., B. safensis and B. subtilis, and characterized them biophysically (via Size Exclusion Chromatography-Small Angle X-ray Scattering (SEC-SAXS) and Differential Scanning Fluorimetry (DSF)) and enzymatically (via spectroscopic activity assays). As expected based on their high sequence identity, the two MDH orthologs had similar properties in most regards, including a tetrameric structure and high susceptibility to substrate inhibition. However, we uncovered differences in conditional thermal stability, in addition to subtle differences in enzymatic activity that offer insight into the workings of LDH-like MDH. Summary statement: Malate dehydrogenase (MDH) is a fundamental metabolic enzyme, from microbes to mammals, yet comparably little is known about microbial MDH, especially MDH of the tetrameric MDH3 class. We compare the biophysical and enzymatic properties of two such enzymes from the societally relevant bacterial species Bacillus subtilis and Bacillus safensis, offering useful insight with potential biotechnological implications.
Guo, X.
Building and refining cryo-EM atomic models often requires long, project-specific workflows that combine map inspection, prior structural knowledge, restraints, refinement, validation and expert review. Existing programs perform many individual operations, but coordinating them across iterative model-building sessions remains manual and difficult to audit. We present StructAgent, a user-guided multi-agent resource for cryo-EM model building and refinement. StructAgent couples a domain agent for literature-grounded structural reasoning with an execution agent that runs local software, tracks state, recovers from failures and records provenance. Expert approval gates control major model-changing actions. In three case studies, StructAgent refitted a 64-chain proteasome from an earlier template, audited 530 ribosomal metal-ion sites and guided a chemically ambiguous ligand fit in a folate-metabolism enzyme from ongoing work. These demonstrations show that agentic orchestration can convert modeling intent into auditable, reviewable software workflows while preserving expert control and final scientific judgment.
Kinman, L. F.; Grassetti, A. V.; Carreira, M. V.; Davis, J. H.
The emergence of single-particle cryoEM as a powerful method for structure determination has in large part been fueled by its ability to resolve both single static structures and complex conformational landscapes. Indeed, modern approaches to the heterogeneous reconstruction task can resolve 100s-1,000s of different maps from a single cryoEM dataset. How accurate these algorithms are, however, has proven difficult to rigorously assess, due to a lack of suitable benchmark datasets containing both realistic noise features and ground-truth labels. To address this obstacle, we recently developed a series of benchmark datasets that leverage the targeting power of Cas9 and the programmable heterogeneity of DNA to newly offer access to ground-truth per-particle structural labels in real data. Here, we challenged two popular heterogeneous reconstruction algorithms with mixed particle stacks resampled in silico from these datasets, finding that existing approaches resolve the encoded heterogeneity with limited accuracy. In particular, in realistic particle stacks with complex, multi-scale, and multi-axis heterogeneity, we observed that reconstruction of encoded heterogeneity depended strongly on the application of prior information about where heterogeneity was expected, and that individual particle assignments were made with significant error even when the correct structural states were reconstructed. Both molecular breathing motions and data collection features, such as defocus and projection angle, contributed to the observed particle assignment error. These results highlight important shortcomings of existing heterogeneous reconstruction methods and suggest new avenues for method development in both data collection strategies and in heterogeneous classification and reconstruction algorithms.
Liu, Y.; Lee, K.-Y.; He, Y.; Kim, D.; Chang, H.; Cherezov, V.; Feigon, J.; Qin, P. Z.
Double-stranded DNA minicircles have been observed in a variety of biological settings and are also widely employed in biotechnology, therapeutic applications, and basic research. Here, we report a cryo-EM structure of a 95-basepair minicircle (dsMC95) at a 5.3 Å resolution. dsMC95 forms a closed ring as designed and no local deformation is observed. The two DNA strands are fully resolved, with the major and minor grooves clearly distinguishable. Analysis reveals a nine-fold periodicity in the helical twist, which corresponds to approximately 10.56 base pairs per turn. Together with groove width analysis, the data indicate that dsMC95 maintains a B-DNA configuration. The dsMC95 ring exhibits an in-plane ellipticity of 1.13 and an out-of-plane displacement of 15°, with differences in out-of-plane displacements observed between the two half-segments. The dsMC95 structure, which is the only free DNA cryo-EM structure with a resolution better than 6 Å to date, allows comparison to other structures to better understand DNA physical features such as bending. The findings advance our understanding of DNA structure under topological constraints and may inform studies of naturally occurring small circular DNA as well as the manipulation of DNA in nanotechnology applications.
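The helical repeat quoted in this abstract follows from simple arithmetic if the nine-fold periodicity is read as nine complete helical turns around the 95-basepair ring. That reading is an assumption made here, though it reproduces the stated value of about 10.56 base pairs per turn. The variable names below are illustrative only.

```python
# Assumption: nine-fold periodicity means nine full helical turns per ring.
BASE_PAIRS = 95
HELICAL_TURNS = 9

# Helical repeat: base pairs needed for one full 360-degree turn.
bp_per_turn = BASE_PAIRS / HELICAL_TURNS

# Average twist angle between neighbouring base pairs, in degrees.
twist_per_bp = 360.0 * HELICAL_TURNS / BASE_PAIRS

print(f"{bp_per_turn:.2f} bp per turn")      # 10.56 bp per turn
print(f"{twist_per_bp:.2f} degrees per bp")  # 34.11 degrees per bp
```

Both numbers sit close to the commonly quoted B-DNA values of roughly 10.5 base pairs per turn and 34 to 36 degrees per base pair, in line with the abstract's B-DNA conclusion.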
Fieux-Castagnet, A.; Waton, J.; Glukhonemykh, A.; Snow, E.; Ashokkumar, R.; Fleming, J.; Champagne, D.; Devenyns, T.; Peluffo, A.; Anagnostopoulos, C.
Protein structure prediction models (such as AlphaFold, Chai, Boltz) have transformed structural biology and are increasingly explored for drug discovery; however, their utility for large-scale screening of antibody-antigen (AB-AG) interactions remains unclear, particularly for distinguishing true binding from non-binding pairs at scale. To our knowledge, there has not been an exhaustive exploration of Boltz-2 inference settings on this high-impact problem, and in this paper we set out to describe and implement a novel benchmarking framework that can accelerate progress in the field. We evaluated Boltz-2 (NVIDIA NIM implementation) on 519 therapeutic monoclonal antibodies from Thera-SAbDab, pairing each antibody with its cognate target and a randomly assigned non-cognate antigen. We developed a novel evaluation framework that systematically captures variability across stochastic seeds while benchmarking different inference settings, including datasets with and without crystallographically resolved antibody structures. Across settings, Boltz-2-derived confidence metrics showed weak, though above-chance, discrimination (0.5 < ROC-AUC < 0.60). Among evaluated metrics, the minimum value of the interface predicted TM-score (ipTM-min) across seed-samples captured the strongest signal. Interestingly, additional feature aggregation and multivariate modelling provided little to no improvement. Increasing the number of stochastic predictions yielded front-loaded gains, with diminishing returns beyond ~15-20 seed-samples, suggesting limited value of extensive sampling in practical workflows. Notably, inference without multiple sequence alignments (MSAs) slightly improved performance on non-crystallized antibodies (ΔAUROC ≈ +0.027) while reducing runtime by ~8 seconds per prediction compared to shallow MSA settings.
Overall, these results indicate that off-the-shelf confidence metrics from general-purpose structure prediction models may be insufficient for reliable target-antibody screening and highlight the need for task-specific optimization, while confirming that modest amounts of sampling can be helpful, but not in itself sufficient to improve performance significantly as gains plateau relatively quickly.
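The evaluation described in this abstract (aggregate a confidence score across stochastic seeds by taking its minimum, then measure how well the aggregate separates cognate from non-cognate pairs by ROC-AUC) can be sketched in a few lines. This is an illustrative sketch only: the scores are made-up numbers rather than Boltz-2 outputs, the function names are hypothetical, and ROC-AUC is computed through its pairwise-ranking definition instead of a library call.

```python
def min_over_seeds(per_pair_seed_scores):
    """Aggregate each pair's per-seed scores into a single minimum value."""
    return [min(scores) for scores in per_pair_seed_scores]


def roc_auc(positive_scores, negative_scores):
    """ROC-AUC as the probability that a positive outranks a negative.

    Ties count as half a win (Mann-Whitney formulation).
    """
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical ipTM values: one inner list per antibody-antigen pair,
# one value per stochastic seed.
cognate = [[0.6, 0.7, 0.5], [0.4, 0.8, 0.3]]
non_cognate = [[0.4, 0.6, 0.45], [0.2, 0.35, 0.5]]

auc = roc_auc(min_over_seeds(cognate), min_over_seeds(non_cognate))
print(auc)  # 0.75
```

An AUC of 0.5 corresponds to chance and 1.0 to perfect separation, which puts the reported range of 0.5 to 0.60 only slightly above chance.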
Qian, J.; Gong, Y.; Liu, F.; Huang, Y.; Guo, G.; Zhu, Y.; Huang, Q.
Accurate particle picking from noisy cryo-EM micrographs is essential for high-resolution reconstruction. Current deep learning methods rely on manually annotated data, which is labor-intensive, subjective, and limits particle recall under low signal-to-noise ratio (SNR). Here we introduce ParSeek, an automated picker trained entirely on synthetic data without human annotation. Synthetic micrographs are generated by projecting known 3D structures into realistic background patches that reproduce experimental noise. On seven public cryo-EM datasets, ParSeek outperformed Topaz and CryoSegNet on four datasets, achieving the highest F1-score (up to 0.82) and reaching 0.63 on a challenging membrane protein dataset. Density maps from ParSeek-picked particles showed cross-correlation coefficients up to 0.995 with the reference and a minimal resolution difference of 0.1 Å. ParSeek also overcame severe orientation bias on an influenza dataset, yielding a reasonable reconstruction. Applied to three experimental datasets (an antibody-antigen complex and two GPCRs), ParSeek enabled reconstructions at 5.0 Å, 4.0 Å, and 2.8 Å, respectively. The 2.8 Å map resolved side-chain densities and ligand flexibility. This study establishes a fully synthetic-data-driven strategy that eliminates manual annotation for training cryo-EM deep-learning models, paving the way for automated, unbiased particle picking.
Grigoriadis, I.
Computer-aided drug design for conditional biomolecular interfaces requires evaluation across more than one receptor structure, docking pose, or scalar score. LINE-1 ORF1p is treated here as a state-family interface target whose relevant behavior is distributed across receptor microstates, assembly-compatible contact neighborhoods, ligand conformers, and perturbation snapshots. This article presents Linobectide as a mathematical-chemistry CADD workflow centered on a modified black-hole algorithm (MBHA) for persistence-weighted prioritization of putative ORF1p inhibitor candidates. Each molecule is represented as a dossier containing standardized descriptors, docking annotations, interaction-class persistence vectors, finite-action stability traces, graph-localization summaries, SPECTRAL-SAR applicability-domain records, and rank-shift diagnostics. The revised analysis emphasizes numerical reporting endpoints: fixed run parameters, baseline comparators, ablation metrics, rank stability, regeneration fractions, protected-elite fractions, and reproducibility indices. Docking is used as an annotation layer rather than as a stand-alone proof of inhibition. The framework is therefore reported as a transparent computational prioritization protocol that generates testable hypotheses for future biochemical and cellular validation, not as experimental proof of ORF1p inhibition or therapeutic activity. Author summary: Drug-design workflows can become over-dependent on the best docking pose even when an interface target remains functional through alternative contact corridors. Linobectide addresses this issue by ranking candidates only after docking annotations are aggregated across receptor-state and perturbation conditions. The MBHA search promotes a candidate when interaction persistence, finite-action stability, graph localization, SPECTRAL-SAR coherence, applicability-domain support, and reproducibility checks are concordant.
The revision removes unsupported claims of performance advantage and replaces them with benchmarkable endpoints that can be compared with docking-only, consensus-docking, and ablated MBHA baselines. The SI Appendix is retained as a figure atlas for state-family construction, graph-localization diagnostics, docking provenance, consensus geometry, and comparative triage.
Florez, I.; Farhat, A.; Le Houx, J.; Altamura, E.; Tozzi, G.
Quantum kernel methods offer a potential advantage for classification tasks in high-dimensional feature spaces, yet their practical benefit critically depends on how input features are prepared. We compare five dimensionality reduction strategies, namely principal component analysis (PCA), Gaussian random projection (RP Gaussian), sparse random projection (RP Sparse), partial least squares (PLS), and uniform manifold approximation and projection (UMAP), as pre-processing steps for quantum kernel support vector machines (SVMs) applied to trabecular bone classification from synthetic micro-computed tomography (micro-CT) data. Using a custom procedural generator based on Gaussian random field zero-crossings, we produced 500 synthetic trabecular bone volumes with controlled morphometric properties such as bone volume fraction (BV/TV), trabecular thickness (Tb.Th), number (Tb.N) and spacing (Tb.Sp). Texture features extracted from grayscale slices are reduced to 8-dimensional quantum circuit inputs via each method, then classified using both classical radial basis function (RBF)-SVMs and quantum kernel SVMs with ZZ feature maps on a statevector simulator, both evaluated with 5 x 5 repeated stratified cross-validation (25 folds). Our results show that UMAP is the only reduction method where the quantum kernel remains competitive with the classical baseline. Under repeated cross-validation, UMAP showed a +0.032 accuracy gap favouring the quantum kernel (Dietterich 5 x 2 CV p = 0.177); however, validation on 10 fully independent datasets, each with independently generated samples, separate reduction fits, and separate kernel matrices, reversed the sign to -0.030 (paired t-test p = 0.123; Wilcoxon p = 0.193; quantum wins 3/10 datasets), indicating that the apparent advantage was likely inflated by fold dependence.
Nevertheless, UMAP's gap remains small and non-significant in both analyses, whereas all linear methods (PCA, RP Gaussian, PLS) show substantial quantum deficits of -0.090 to -0.116 across BV/TV classification, with PCA and PLS remaining significant under corrected tests (5 x 2 CV p = 0.004 and p = 0.007 respectively). We additionally evaluate quantum kernel ridge regression for continuous morphometric prediction, finding that ZZ quantum kernels fail uniformly at regression (negative R² for all methods except PLS at 4 qubits), suggesting that the ZZ kernel captures decision boundaries but not smooth metric structure. These findings provide practical guidance for feature engineering in near-term quantum machine learning pipelines and demonstrate that the choice of dimensionality reduction can determine whether quantum kernels remain competitive with classical baselines.
Dilip, R.; Qu, S. J.; Chen, Z.; Van Valen, D. A.
Structural cell biology aims to visualize and identify functional molecules in their native environment. Macromolecular complexes have thus far been resolved predominantly at intermediate resolutions. This poses a major challenge for modeling due to the vast combinatorial space of possible components within a proteome. Here, we developed Cryosearch, a system for automated modeling of macromolecular complexes from proteome-scale monomer libraries. We implemented Monte Carlo tree search with correlation-based rewards to identify combinations of protein domains that collectively best explain a density map. This approach enabled autonomous de novo assembly of molecular complexes from intermediate-resolution maps, a task that has been difficult to perform manually.
So-Last, M. G. F.; Hale, T.; Burt, A.; Allegretti, M.
Cellular cryo-electron tomography (cryo-ET) reveals high-resolution details of macromolecules within their native cellular environment. However, in situ cryo-ET datasets are large and highly heterogeneous, which makes comprehensive identification and extraction of the many different elements of cellular architecture for high-resolution analysis a challenging, time-consuming and often tedious task. Here we present easymode, a library of pretrained general segmentation networks for cryo-ET, trained on over 4,000 tilt series spanning a large and diverse variety of sources. Easymode enables in situ structural determination workflows by rendering tomogram content computationally accessible, without requiring any per-dataset training. Beyond directly facilitating high-resolution subtomogram averaging of a selection of widely prevalent complexes, we show how easymode can be used to leverage cellular context in subtomogram averaging workflows, helping identify, align, or filter particle sets, and enabling automated mapping of the cellular landscape surrounding target proteins. We use easymode to determine the in situ structure of rare inosine monophosphate dehydrogenase (IMPDH) filaments at 4.0 Å resolution, and to map and visualize the surrounding cellular environment.